Web Table Classification Based on Visual Features

Abstract

Tables on the web constitute a valuable data source for many applications, like factual search and knowledge base augmentation. However, as genuine tables containing relational knowledge only account for a small proportion of the tables on the web, reliable genuine web table classification is a crucial first step of table extraction. Previous works usually rely on explicit feature construction from the HTML code. In contrast, we propose an approach for web table classification by exploiting the full visual appearance of a table, which works purely by applying a convolutional neural network (CNN) on the rendered image of the web table. Since these visual features can be extracted automatically, our approach circumvents the need for explicit feature construction. A new hand-labeled gold standard dataset containing HTML source code and images for 13,112 tables was generated for this task. Transfer learning techniques are applied to the well-known VGG16 and ResNet50 architectures. The evaluation of CNN image classification with fine-tuned ResNet50 (F1 93.29%) shows that this approach achieves results comparable to previous solutions using explicitly defined HTML code based features. By combining visual and explicit features, an F-measure of 93.70% can be achieved by Random Forest classification, which beats current state-of-the-art methods.
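As a rough illustration of two ideas in the abstract, the sketch below concatenates a visual feature vector with an explicit HTML feature vector and computes an F-measure from predictions. It is not the authors' code: the function names `combine_features` and `f1_score`, the toy vectors, and the toy labels are all hypothetical, and the abstract does not state how the F-measure was averaged, so the positive-class F1 used here is an assumption.

```python
from typing import List, Sequence


def combine_features(visual: Sequence[float], explicit: Sequence[float]) -> List[float]:
    """Concatenate CNN-derived features with hand-built HTML features.

    Assumption: plain concatenation; the abstract only says the two
    feature sets are combined before Random Forest classification.
    """
    return list(visual) + list(explicit)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return F1 = 2PR / (P + R) for the positive ("genuine table") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 1 = genuine table, 0 = non-genuine table.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)

# Toy feature vectors (hypothetical values and meanings).
row = combine_features([0.1, 0.2], [3.0, 1.0])
```

In practice the combined rows would be passed to a classifier such as a Random Forest; that step is omitted here to keep the sketch free of third-party dependencies.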

Similar resources

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, and applying policies to restrict network access. Early methods in this field used obvious traffic features such as port number and protocol type to classify traffic. However, recent changes in applicat...

Classification of News Web Documents Based on Structural Features

The motivation for this work comes from the need for a Thai web corpus for testing our information retrieval algorithm. Two collections of news web documents are gathered from two different Thai newspaper web sites. Our goal is to find a simple yet effective method to extract news articles from these web collections. We explore the use of machine learning methods to distinguish article pages from...

Hyperspectral Images Classification by Combination of Spatial Features Based on Local Surface Fitting and Spectral Features

Hyperspectral sensors are important tools for monitoring phenomena on the Earth because they acquire a large number of spectral bands. Hyperspectral image classification is one of the most important fields of hyperspectral data processing, and so far there have been many attempts to increase its accuracy. Spatial features are important due to their ability to increase classification acc...

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide sorted results according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

3D Classification of Urban Features Based on Integration of Structural and Spectral Information from UAV Imagery

Three-dimensional classification of urban features is an important tool for urban management and the basis of many analyses in photogrammetry and remote sensing. It is therefore used in applications such as planning, urban management and disaster management. In this study, dense point clouds extracted from dense image matching are applied for classification in urban areas. Appl...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-74296-6_15